Web Page Classification: a Semantic Analysis

Authors

  • Rocío Vargas Arroyo
  • Azucena Montes Rendón
Abstract

In this paper, a semantic analysis for Web page classification is presented. A set of Web pages, resulting from a simple query to a Web browser, is categorized by disambiguating the meaning of the term used for the search. The disambiguation process begins with the isolation of some outstanding paragraphs; linguistic markers are used to accomplish this task. The search term is located within the paragraphs and the Contextual Exploration Method is used to identify words that lead to the discovery of relationships within an Ontology. Finally, the discovered relationships are used to assign the Web page to a category.
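
The steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the marker list `MARKERS`, the toy `ONTOLOGY`, the five-word context window, and the simple vote count are all assumptions chosen for illustration, and the actual rules of the Contextual Exploration Method are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical linguistic markers that flag a paragraph as "outstanding".
MARKERS = ("in summary", "in conclusion", "is defined as", "refers to")

# Hypothetical toy ontology: each candidate sense of the search term
# is linked to a few related concepts.
ONTOLOGY = {
    "animal": {"feline", "predator", "habitat", "species"},
    "car": {"engine", "vehicle", "dealer", "model"},
}


def outstanding_paragraphs(text):
    """Keep only paragraphs that contain at least one linguistic marker."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs if any(m in p.lower() for m in MARKERS)]


def context_words(paragraph, term, window=5):
    """Collect the words within `window` tokens of each occurrence of `term`."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    found = []
    for i, token in enumerate(tokens):
        if token == term:
            found.extend(tokens[max(0, i - window):i])
            found.extend(tokens[i + 1:i + 1 + window])
    return found


def classify(text, term):
    """Return the ontology category with the most context matches, or None."""
    votes = Counter()
    for paragraph in outstanding_paragraphs(text):
        for word in context_words(paragraph, term.lower()):
            for category, concepts in ONTOLOGY.items():
                if word in concepts:
                    votes[category] += 1
    return votes.most_common(1)[0][0] if votes else None


page = (
    "Welcome to our site.\n\n"
    "In summary, the jaguar is a large feline predator of the Americas.\n\n"
    "Contact us."
)
category = classify(page, "jaguar")
```

In this sketch a page with no marked paragraph, or with no ontology match near the search term, is left unclassified (`None`) rather than forced into a category.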

Similar resources

Web page classification based on a support vector machine using a weighted vote schema

Traditional information retrieval methods use keywords occurring in documents to determine the class of the documents, but usually retrieve unrelated web pages. To classify web pages effectively while addressing the synonymous keyword problem, we propose a web page classification method based on a support vector machine using a weighted vote schema for various features. The system uses both latent seman...

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

Semantic similarity based web document classification using support vector machine

With the rapid growth of information on the World Wide Web (WWW), classification of web documents has become important for efficient information retrieval. The relevance of retrieved information can also be improved by considering semantic relatedness between words, which is a basic research area in the fields of natural language processing, intelligent retrieval, document clustering and classification,...

Web Page Structure Enhanced Feature Selection for Classification of Web Pages

Web page classification is achieved using text classification techniques. It differs from traditional text classification because web page structure provides additional information about content importance. HTML tags provide visual web page representation and can be considered a parameter to highlight content importance. Textual keywords...

Experiments in Web Page Classification for Semantic Web

We address the problem of web page classification within the framework of automatic annotation for Semantic Web. The performance of several classification algorithms is explored on the Four Universities dataset using page text and link information, with a limited-size feature set. Several well-known classification algorithms are evaluated on the task of web-page classification using the text of...

Publication date: 2006